Video coding and decoding

ABSTRACT

A video coding and decoding method, wherein a picture is first divided into sub-pictures corresponding to one or more subjectively important picture regions and to a background region sub-picture, which remains after the other sub-pictures are removed from the picture. The sub-pictures are formed to conform to predetermined allowable groups of video coding macroblocks (MBs). The allowable groups of MBs can be, for example, of rectangular shape. The picture is then divided into slices so that each sub-picture is encoded independently of the other sub-pictures, except for the background region sub-picture, which may be coded using the other sub-pictures. The slices of the background sub-picture are formed in scan order, skipping over MBs that belong to another sub-picture. The background sub-picture is decoded only if the positions and sizes of all other sub-pictures can be reconstructed when decoding the picture.

FIELD OF THE INVENTION

This invention relates to video coding and decoding. It relates particularly, but not exclusively, to video coding and transmission over error-prone data connections.

BACKGROUND OF THE INVENTION

Video transmission requires coding of the video in a form that allows its transmission. Typically, this involves effective compression due to the vast amount of information contained in a stream of pictures that constitute a video to be transmitted.

ITU-T H.263 is an International Telecommunication Union (ITU) video coding recommendation which specifies the bit-stream syntax and the decoding of a bit-stream. In this standard, pictures are coded using luminance and two colour difference (chrominance) components (Y, Cb and Cr). The chrominance components are each sampled at half resolution along both co-ordinate axes compared to the luminance component.

Each coded picture, as well as the corresponding coded bit stream, is arranged in a hierarchical structure with four layers being, from top to bottom, a picture layer, a picture segment layer, a macroblock (MB) layer and a block layer. The picture segment layer can be either a group of blocks layer or a slice layer.

The picture layer data contains parameters affecting the whole picture area and the decoding of the picture data. By default, each picture is divided into groups of blocks. A group of blocks (GOB) typically comprises a row of macroblocks (16 consecutive pixel lines) or a multiple thereof. Data for each GOB consists of an optional GOB header followed by data for MBs. As an alternative to GOBs, so-called slices can be used, whereby each picture is divided into slices instead of GOBs. Data for each slice consists of a slice header followed by data for MBs.

The slices define regions within a coded picture. Each region is a number of MBs in a normal scanning order. There are no prediction dependencies across slice boundaries within the same coded picture. However, temporal prediction can generally cross slice boundaries unless ITU-T H.263 Annex R (Independent Segment Decoding) is used. Slices can be decoded independently from the rest of the picture data (except for the picture header). Consequently, slices improve error resilience in packet-lossy networks.

Each GOB or slice is divided into MBs. An MB relates to 16×16 pixels of luminance data and the spatially corresponding 8×8 pixels of chrominance data. In other words, an MB consists of four 8×8 luminance blocks and two spatially corresponding 8×8 chrominance blocks.
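As a small illustration of the macroblock arithmetic above, the following Python sketch computes the macroblock grid of a picture and the number of samples in one macroblock. The helper names `mb_grid` and `samples_per_mb` are hypothetical, and the picture dimensions are assumed to be multiples of 16, as they are for the common H.263 formats.

```python
MB_SIZE = 16  # luminance samples along each side of a macroblock


def mb_grid(width: int, height: int) -> tuple:
    """Return (columns, rows) of macroblocks covering a picture.

    Assumes both dimensions are multiples of 16, as for the common
    H.263 picture formats (e.g. QCIF 176x144, CIF 352x288).
    """
    if width % MB_SIZE or height % MB_SIZE:
        raise ValueError("picture dimensions must be multiples of 16")
    return width // MB_SIZE, height // MB_SIZE


def samples_per_mb() -> dict:
    """Sample counts for one macroblock with chrominance at half resolution."""
    luma = 4 * 8 * 8    # four 8x8 luminance blocks (= one 16x16 area)
    chroma = 2 * 8 * 8  # one 8x8 Cb block and one 8x8 Cr block
    return {"blocks": 6, "luma": luma, "chroma": chroma, "total": luma + chroma}


cols, rows = mb_grid(176, 144)    # QCIF
print(cols, rows, cols * rows)    # 11 9 99
print(samples_per_mb()["total"])  # 384
```

Because the chrominance is subsampled by two in both directions, each macroblock carries half as many chrominance samples as luminance samples.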

Rather than using regions formed of a number of MBs in the normal scan order, rectangular regions consisting of N×M macroblocks (N, M greater than or equal to one) replacing the slice and GOB structures were proposed to ITU-T H.263 by Sen-ching Cheung, “Proposal on using Region Layer in H.263+”, ITU-T SG15 WP1 document LBC-96-213, July 1996. However, the proposal was not adopted for H.263.

In ITU-T H.263 Independent Segment Decoding mode (ITU-T H.263 Annex R), segment boundaries (as defined by the boundaries of the slices or the upper boundaries of the GOBs for which GOB headers are sent, or the boundaries of the picture, whichever bounds a region in the smallest way) are treated similarly to picture boundaries, which eliminates all error propagation from neighboring slices. For example, errors cannot be propagated from neighboring slices due to motion compensation or de-blocking loop filtering. Segment boundaries can only be changed at INTRA pictures, i.e. when no inter-coding is required.

The ISO/IEC standard draft 14496-2:1999(E), referred to as MPEG-4 visual or MPEG-4 video, is a standard draft whose design is centered around a basic unit of content called an audio-visual object (AVO). Examples of AVOs are a musician (in motion) in an orchestra, the sound generated by that musician, the chair she is sitting on, the (possibly moving) background behind the orchestra, and explanatory text for the current passage. In MPEG-4 video, each AVO is represented separately and becomes the basis for an independent stream.

The coding of natural two-dimensional motion video is a part of MPEG-4 video. MPEG-4 video is capable of coding both conventional rectangular video objects and arbitrarily shaped two-dimensional video objects. The basic video AVO is called a video object (VO). The VOs can be scalable, i.e. they may be split up, coded, and sent in two or more video object layers (VOLs). One of these VOLs is called the base layer, which all terminals must receive in order to display any kind of video. The remaining VOLs are called enhancement layers, which may be expendable in case of transmission errors or restricted transmission capacity. In the case of non-scalable video coding, one VOL per VO is coded.

A snapshot in time of a video object layer is called a video object plane (VOP). For a rectangular video, this corresponds to a picture or a frame. However, in general, the VOPs can have an arbitrary shape. Each VOP can be divided into video packets. Each VOP and video packet is further divided into macroblocks similarly to ITU-T H.263. The colour (YUV) information of the macroblock is coded similarly to ITU-T H.263, i.e., the macroblock is further divided into 8×8 blocks. In addition, if the VOP has an arbitrary shape, the shape of the macroblock is coded as explained in the next paragraph.

The MPEG-4 video VOs may be of any shape, and furthermore the shape, size, and position of the object may vary from one frame to the next. In terms of its general representation, a video object is composed of three colour components (YUV) and an alpha component. The alpha component defines the object's shape on a picture-by-picture basis. Binary objects form the simplest class of objects. They are represented by a sequence of binary alpha maps, i.e. 2-dimensional pictures where each pixel is either black or white. MPEG-4 video provides a binary shape only mode for compressing these objects. The compression process is defined exclusively by a binary shape encoder for coding the sequence of alpha maps. In addition to binary objects, a grey-level alpha map can be used to define the opacity of the object. The object boundary is coded using a binary alpha map, while the grey-level alpha information is coded similarly to texture coding using the DCT transform. In addition to the sequence of object shape and opacity definitions, the representation comprises the colours of all the pixels within the interior of the object shape. MPEG-4 video encodes these objects using a binary shape encoder and then a motion compensated discrete cosine transform (DCT)-based algorithm for the interior texture coding.

It is also known to be advantageous to segment a video bit-stream into portions of different priorities, for example by scalable video coding, data partitioning, or region-based coding discussed above.

Scalable video coding and data partitioning suffer, however, from dependencies between different coding elements. An enhancement layer, for example, cannot be decoded correctly if the base layer has not been received correctly. Correspondingly, a low-priority partition is of no use if the corresponding high-priority partition has not been received. This makes the use of scalable video coding and data partitioning disadvantageous in some cases. Scalable coding and data partitioning do not provide means to handle spatial regions of interest differently from subjectively less important areas. Moreover, many forms of scalable coding, such as conventional signal to noise ratio (SNR) and spatial scalability, suffer from a worse compression efficiency compared to non-scalable coding. In the region-based video coding, on the other hand, the GOBs or slices may contain macroblocks of different subjective importance. Thus, no prioritization of GOBs and slices is typically possible.

Coding of arbitrarily shaped objects is currently considered too complex for handheld devices. This is further exemplified by the fact that MPEG-4 video shape coding tools are typically excluded from mobile video communication services of the planned third generation mobile telephones.

SUMMARY OF THE INVENTION

It is an object of the invention to provide an alternative suitable for mobile communication which nevertheless provides at least some of the advantages offered by MPEG-4 video.

According to a first aspect of the invention there is provided a method of video encoding comprising the steps of:

-   dividing a picture into a set of regular shaped coding blocks having a predetermined alignment in relation to the area of the picture, each coding block corresponding to at least one group of elementary coding elements;
-   determining at least one shape within a picture;
-   selecting at least one subset of the coding blocks defining at least one area covering the at least one determined shape;
-   determining as at least one separate coding object the selected at least one subset of the coding blocks;
-   determining as a background object the part of the picture that excludes the at least one separate coding object;
-   encoding the at least one separate coding object; and
-   encoding as one coding object the background object.
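The steps above can be sketched with macroblocks as the coding blocks and MB-aligned rectangles as the selected subsets. This is a minimal Python sketch under those assumptions; the function names, the (left, top, width, height) rectangle convention in MB units and the 10×8-MB example picture are hypothetical. The key point is that the background object needs no explicit shape description: it is simply every MB not claimed by a separate coding object.

```python
def rect_to_mbs(left, top, width, height, cols):
    """Raster-order MB addresses of an MB-aligned rectangle (all in MB units)."""
    return {(top + r) * cols + (left + c)
            for r in range(height) for c in range(width)}


def split_picture(cols, rows, rects):
    """Split a cols x rows MB picture into separate objects and a background.

    rects: list of (left, top, width, height) rectangles in MB units, one per
    separate coding object. The background is every MB not covered by any.
    """
    separate = [rect_to_mbs(*r, cols) for r in rects]
    covered = set().union(*separate)
    background = set(range(cols * rows)) - covered
    return separate, background


# Hypothetical 10x8-MB picture (80 MBs) with one 3x4-MB foreground object.
objects, background = split_picture(10, 8, [(3, 2, 3, 4)])
print(len(objects[0]), len(background))  # 12 68
```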

It is an advantage of the invention that a background coding object can be determined as a unitary coding object that is defined as the part of the picture that does not belong to any separate coding object and that the separate coding objects need not conform to the shapes which they cover.

Preferably, the background coding object is coded using the at least one separate coding object.

The background object cannot be reconstructed without determining the position, shape and size of each separate coding object. If any data packet carrying a separate coding object is lost, there is no chance to decode the background coding object anyway. Successfully determining the position and size of the at least one separate coding object indicates that video data of the at least one separate coding object is present. There is thus a high likelihood of successful prediction of the background coding object from the at least one separate coding object, so it is typically reasonable to encode the background coding object using the at least one separate coding object.

Preferably, encoding the background coding object further comprises the sub-step of defining coding slices in a scan order so that the slices are composed of consecutive coding blocks, skipping those coding blocks which are included in the at least one separate coding object.

Preferably, the scan order is scanning first one horizontal line and then vertically proceeding to the next horizontal line. Alternatively, the scan order is scanning first one vertical line and then horizontally proceeding to the next vertical line. Yet alternatively, any other scan order may be used.
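The two named scan orders can be written as simple generators over (column, row) positions. The generator names below are hypothetical; any other scan order could be substituted as long as encoder and decoder agree on it.

```python
def horizontal_scan(cols, rows):
    """Scan one horizontal line, then proceed vertically to the next line."""
    for y in range(rows):
        for x in range(cols):
            yield x, y


def vertical_scan(cols, rows):
    """Scan one vertical line, then proceed horizontally to the next line."""
    for x in range(cols):
        for y in range(rows):
            yield x, y


print(list(horizontal_scan(3, 2)))  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
print(list(vertical_scan(3, 2)))    # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```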

Preferably, encoding the at least one separate coding object further comprises the sub-step of defining, within each separate coding object, coding slices in a scan order so that the slices are composed, in the scan order, of coding blocks included in the at least one separate coding object.

It is an advantage of the invention that objects of high subjective interest can be video encoded separately from the background with reduced computational requirements, as the area defined for a shape conforms to the predetermined alignment of the coding blocks.

Preferably, the coding blocks are macroblocks.

Preferably, the area covering the at least one determined shape is a rectangular area, a square being a special case of a rectangle.

Preferably, the separate coding objects are defined in descending order of subjective importance.

Preferably, a subjectively less important separate coding object entirely excludes the coding blocks that define the area covering the at least one determined shape corresponding to a subjectively more important separate coding object. This allows automatic clipping of the corners of a rectangular area defined by a subjectively less important coding object where they would otherwise overlap with any area defined by a subjectively more important coding object.
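The clipping rule can be sketched by assigning MBs to the separate coding objects in descending order of importance, each object keeping only the MBs not already claimed. This is a hypothetical Python helper assuming MB-aligned rectangles given as (left, top, width, height) in MB units; note that a clipped object is in general no longer rectangular.

```python
def clip_by_importance(rects, cols):
    """Assign MBs to separate coding objects in descending importance.

    rects: (left, top, width, height) in MB units, most important first.
    Each object keeps only the MBs not already claimed by a more important
    one, so overlapping corners are clipped from the less important object.
    """
    claimed = set()
    objects = []
    for left, top, width, height in rects:
        mbs = {(top + r) * cols + (left + c)
               for r in range(height) for c in range(width)}
        own = mbs - claimed
        objects.append(own)
        claimed |= own
    return objects


# Hypothetical 10-column picture: object B's top-left corner overlaps object A.
a, b = clip_by_importance([(2, 1, 3, 3), (4, 3, 3, 2)], cols=10)
print(sorted(b))  # [35, 36, 44, 45, 46]
```

MB 34 lies in both rectangles and therefore stays with the more important object A.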

Preferably, the video encoding of the at least one separate coding object is independent of the video encoding of the background object so as to inhibit error propagation into the at least one separate coding object.

The use of independent video encoding of the at least one separate coding object enhances the robustness of the video encoding, although the position of the at least one separate coding object then cannot be changed without sending an intra picture that is not based on earlier pictures.

Alternatively, the video encoding of the at least one separate coding object is allowed to depend on the video encoding of the background object and on any other of the at least one separate coding object.

This embodiment basically causes a sub-picture boundary of the at least one separate coding object to be treated as a slice boundary. The position and size of the at least one separate coding object may then be changed even if the at least one separate coding object is being inter-coded.

Preferably, the video encoding of the background object is allowed to use the at least one separate coding object so as to enhance video compression efficiency.

Preferably, the method further comprises the step of determining information characterising the position and size of the at least one separate coding object for use in decoding the picture.

Preferably, the step of determining information characterising the size of the at least one separate coding object comprises the sub-step of computing a reference width based on the width of the picture and expressing the width of the at least one separate coding object using the reference width.

Preferably, the step of determining information characterising the size of the at least one separate coding object comprises the sub-step of computing a reference height based on the height of the picture and expressing the height of the at least one separate coding object using the reference height.
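The text does not specify how the reference width or height is computed. One plausible reading, which is purely an assumption here, takes the picture dimension in macroblocks as the reference and codes a sub-picture dimension as a fixed-length value just wide enough to span that reference. The hypothetical helpers below sketch that interpretation; the same functions apply to heights.

```python
import math

MB_SIZE = 16


def reference_dimension(picture_pixels: int) -> int:
    """Assumed reference: the picture dimension measured in macroblocks."""
    return picture_pixels // MB_SIZE


def express_dimension(sp_mbs: int, reference: int) -> tuple:
    """Express a sub-picture dimension (in MBs) relative to the reference.

    Returns (value, bits): value = sp_mbs - 1, which lies in 0..reference-1,
    and the fixed code length just large enough to represent that range.
    """
    if not 1 <= sp_mbs <= reference:
        raise ValueError("sub-picture dimension must fit inside the picture")
    bits = max(1, math.ceil(math.log2(reference)))
    return sp_mbs - 1, bits


ref_w = reference_dimension(176)    # QCIF width -> 11 MBs
print(express_dimension(3, ref_w))  # (2, 4)
```

Tying the code length to the picture size keeps the codewords short for small pictures while never allowing a sub-picture larger than the picture itself.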

Preferably, the method further comprises the step of characterising the type of each of the at least one separate coding object for use in decoding the picture.

Preferably, the method further comprises the step of assigning a different identifier to each of the at least one separate coding object for associating each separate coding object with its corresponding characteristics.

Preferably, the video encoding of the at least one separate coding object uses a higher quantisation step density than the video encoding of the background object.

Preferably, the method further comprises the step of error protecting the at least one separate coding object against data corruption.

Preferably, the method further comprises the step of error protecting the background object against data corruption.

Preferably, the at least one separate coding object is more error protected against data corruption than the background object.

Preferably, unequal error protection is used to prioritize data packets containing information related to the at least one separate coding object.

Preferably, the determining of at least one shape within a picture is based on its appearance.

Alternatively, the determining of at least one shape within a picture is based on choosing uniform motion fields.

According to a second aspect of the invention there is provided a method of video decoding a picture coded by a set of coding blocks, each coding block corresponding to at least one group of the elementary coding elements and the coding blocks having a predetermined alignment in relation to the area of the picture, the method comprising the steps of:

-   determining at least one separate coding object corresponding to at least one subset of the coding blocks defining at least one part of a picture being decoded;
-   determining as a background object the subset of the coding blocks that corresponds to the part of the picture that excludes the at least one separate coding object;
-   decoding the at least one separate coding object; and
-   decoding the background object.

Preferably, the method further comprises determining video decoding slices for the background object, comprising the sub-steps of forming a decoding slice of consecutive coding blocks and skipping the coding blocks which belong to the at least one separate coding object.

Preferably, each of the at least one subset of coding blocks defines a rectangular sub-picture, a square being a special case of a rectangle.

Preferably, the coding blocks are macroblocks.

Preferably, the video decoding of the at least one separate coding object is independent of the video decoding of the background object.

It is an advantage of the method that it may be used for various applications such as for prioritized transportation of subjectively important regions. In addition, it allows “picture resolution scalability”, i.e. the picture can be scaled to fit onto a display having a resolution too small for the full picture, by decoding only a separate coding object of a suitable size.

Preferably, the video decoding of the background object is allowed to use the at least one separate coding object. Even more preferably, the background object is predicted spatially, parametrically, and/or temporally from the at least one separate object to make processing simpler.

Preferably, the at least one separate object corresponds to at least one foreground region sub-picture.

Prediction based on the at least one separate object is advantageous because the background objects are often of limited subjective significance. The information of the at least one separate object can thus be used to further enhance the video compression, since possible error propagation from foreground region sub-pictures to the background object may not degrade the subjective picture quality excessively.

Preferably, the method further comprises the step of determining the position and size of the at least one separate coding object.

Preferably, the step of determining the size of the at least one separate coding object comprises the sub-step of computing a reference width based on the width of the picture and determining the width of the at least one separate coding object using the reference width.

Preferably, the step of determining the size of the at least one separate coding object comprises the sub-step of computing a reference height based on the height of the picture and determining the height of the at least one separate coding object using the reference height.

Preferably, the method further comprises the step of determining the type of each of the at least one separate coding object.

Preferably, the video decoding of the at least one separate coding object uses a higher quantisation step density than the video decoding of the background object.

Preferably, the method further comprises the step of detecting a loss of the at least one separate coding object.

Preferably, the detecting of a loss of the at least one separate coding object is based on enumerating the separate coding objects with a pre-defined value for the first separate coding object and a pre-defined increment or decrement from one separate coding object to the next.

A missing expected object number then allows a decoder to detect a loss of the corresponding separate coding object.
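The enumeration-based loss detection can be sketched as follows. The helper name `missing_ids` is hypothetical; the defaults `first=0, step=1` mirror the SubPictureID rule of the slice header syntax (IDs start from zero and increase by one in coding order). Without knowing the last expected ID, a decoder can only detect gaps before the furthest ID it has received.

```python
def missing_ids(received, first=0, step=1, last=None):
    """Return enumerated object IDs that were expected but not received.

    IDs start at `first` and change by `step` (positive or negative) from one
    separate coding object to the next. If `last` is unknown, only gaps before
    the furthest received ID can be detected.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    got = set(received)
    if last is None:
        if not got:
            return []
        last = max(got) if step > 0 else min(got)
    stop = last + (1 if step > 0 else -1)  # make `last` inclusive
    return [i for i in range(first, stop, step) if i not in got]


print(missing_ids([0, 1, 3]))                    # [2]
print(missing_ids([5, 4, 2], first=5, step=-1))  # [3]
```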

Preferably, the method further comprises decoding the at least one separate coding object separately from the other coding objects.

Preferably, the method further comprises the step of error correction decoding the at least one separate coding object.

Preferably, the method further comprises the step of error correction decoding the background object.

It is an advantage of prioritizing the subjectively most important parts of the video bit-stream that a better subjective picture quality can be reached compared to equal transport and error protection of all parts of the bit-stream.

According to a third aspect of the invention there is provided a video encoder comprising:

-   means for dividing a picture into a set of regular shaped coding blocks having a predetermined alignment in relation to the area of the picture, each coding block corresponding to at least one group of elementary coding elements;
-   means for determining at least one shape within a picture;
-   means for selecting at least one subset of the coding blocks defining at least one area covering the at least one determined shape;
-   means for determining as at least one separate coding object the selected at least one subset of the coding blocks;
-   means for determining as a background object the part of the picture that excludes the at least one separate coding object;
-   means for encoding the at least one separate coding object; and
-   means for encoding as one coding object the background object.

According to a fourth aspect of the invention there is provided a video decoder for video decoding a picture coded by a set of coding blocks, each coding block corresponding to at least one group of the elementary coding elements and the coding blocks having a predetermined alignment in relation to the area of the picture, the decoder comprising:

-   means for determining at least one separate coding object corresponding to at least one subset of the coding blocks defining at least one part of a picture being decoded;
-   means for determining as a background object the part of the picture that excludes the at least one separate coding object;
-   means for decoding the at least one separate coding object; and
-   means for decoding the background object.

According to a fifth aspect of the invention there is provided a computer program product comprising computer executable program means for causing an apparatus to implement the method of the first aspect.

According to a sixth aspect of the invention there is provided a computer program product comprising computer executable program means for causing an apparatus to implement the method of the second aspect.

According to a seventh aspect of the invention there is provided an apparatus comprising the video encoder of the third aspect.

According to an eighth aspect of the invention there is provided an apparatus comprising the video decoder of the fourth aspect.

Preferably, the apparatus of any of the third to eighth aspects is selected from a group consisting of: a mobile communication device, a wireless communication device, a gaming device, a video recording device, a video output device, a communication network server, a communication network gateway, a personal computer, a portable computer, and a personal digital assistant device.

According to a ninth aspect of the invention there is provided a video signal comprising:

-   a plurality of compressed video coding blocks corresponding to at least one separate coding object corresponding to a part of a video encoded picture, the part of the video encoded picture having a variable size and position;
-   at least one identifier corresponding to the at least one separate coding object;
-   at least one position and size information corresponding to the at least one separate coding object; and
-   a plurality of compressed video coding blocks corresponding to a background object that corresponds to a set of coding blocks that corresponds to the video encoded picture excluding the at least one separate coding object.

Various embodiments of the present invention have been illustrated only with reference to one aspect of the invention for the sake of brevity, but it should be appreciated that corresponding embodiments may apply to other aspects as well.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 shows a picture to be encoded in relation to macroblocks defined for video encoding;

FIG. 2 shows a principal drawing of video coding slices formed for the picture of FIG. 1 according to a preferred embodiment of the invention;

FIG. 3 shows a flowchart of the video encoding of a picture according to the preferred embodiment of the invention;

FIG. 4 shows a flowchart of the decoding of a picture according to the preferred embodiment of the invention;

FIG. 5 shows a flowchart of the decoding of a background region according to the preferred embodiment of the invention in the case that not all foreground region sub-pictures have been decoded correctly;

FIG. 6 shows a block diagram of a mobile communication device according to the preferred embodiment of the invention; and

FIG. 7 shows a system according to the preferred embodiment of the invention.

DETAILED DESCRIPTION

A preferred embodiment of the invention may be considered as supplementing ITU-T H.26L by adding a sub-picture coding layer between the picture and slice layers. The sub-picture coding layer forms so-called sub-pictures (SPs), which are typically rectangular (foreground region SPs or FR SPs), except for the so-called background region (BR) SP. The BR SP consists of the picture area not falling within any of the rectangular SPs. The SPs are coded one after another, each in scan order, i.e. the slices start at the SPs and do not extend beyond them. The SPs are typically coded in order of subjective priority, so that the subjectively most important SPs are coded first and the BR SP is coded last. The SPs do not overlap, and together they make up the entire encoded picture.

FIG. 1 shows a picture 100 to be encoded in relation to macroblocks (MBs) defined for video encoding. The picture comprises a heart (of a drawn animation) that is considered as a foreground object 101 of main interest. A rectangular foreground region sub-picture (FR SP) 102 has been drawn around the foreground object along the MB borders. Surrounding the foreground object 101, the picture also has a background. The portion of the background that surrounds the FR SP 102 is referred to as the background region sub-picture 103 (BR SP). Note that part of the background may also belong to the FR SP 102, as is the case here. FIG. 1 also shows the MBs numbered in ascending order from 0 to 79, where the first MB (0) is at the upper left-hand corner and the numbering grows to the right and continues after each row from the left of the next row.
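The raster numbering of FIG. 1 maps an MB number to a column and row, and slice positions can then be expressed relative to a sub-picture, as the FirstMBInSliceX and FirstMBInSliceY fields of the slice header syntax do. In the Python sketch below, the helper names are hypothetical, and the 10-column grid used in the example is only an assumption, since the figure's exact layout of the 80 MBs is not stated in the text.

```python
def mb_address(x, y, cols):
    """Raster-order MB number: 0 at the upper-left, growing to the right."""
    return y * cols + x


def mb_position(address, cols):
    """Inverse of mb_address: return (x, y) = (column, row)."""
    y, x = divmod(address, cols)
    return x, y


def relative_to_subpicture(address, cols, left, top):
    """Column/row of an MB relative to a sub-picture's upper-left MB."""
    x, y = mb_position(address, cols)
    return x - left, y - top


# Assuming (for illustration only) a 10-column grid for the 80 MBs of FIG. 1:
print(mb_position(79, 10))                   # (9, 7)
print(relative_to_subpicture(34, 10, 3, 2))  # (1, 1)
```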

FIG. 2 shows a principal drawing of video coding slices formed for the picture of FIG. 1 according to a preferred embodiment of the invention. The picture is segmented into video coding slices using a slice size of 5 macroblocks. The segmentation is started from the most important FR SP, and the BR SP is segmented into slices after all the FR SPs (in FIGS. 1 and 2 only one FR SP is present). The slices are given running slice numbers starting from 0. Note that slice 0 occupies 3 MBs from the first row within the FR SP and then 2 MBs of the second row within the FR SP, and in particular that the last slice of the FR SP is closed before the BR SP is encoded. The MBs in the BR SP are then segmented into slices in scan order so that each slice except the last one is generally composed of the maximum number of MBs allowed for one slice. The slices simply skip over each FR SP. Larger slices generally result in a smaller amount of redundancy required to encode a picture.
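The slice segmentation described for FIG. 2 can be sketched in Python as below. The function names and the picture geometry (a 10×8-MB picture with one 3×4-MB FR SP at column 3, row 2) are hypothetical, chosen only so that slice 0 reproduces the "3 MBs of one row, then 2 of the next" pattern; the real FIG. 1 layout may differ. Horizontal raster scan order and MB-aligned rectangular FR SPs are assumed.

```python
def chunk(addresses, size):
    """Split a list of MB addresses into consecutive slices of at most `size`."""
    return [addresses[i:i + size] for i in range(0, len(addresses), size)]


def form_slices(cols, rows, fr_rects, slice_size):
    """Segment a picture into slices: FR SPs first, then the BR SP.

    fr_rects: (left, top, width, height) in MB units, most important first.
    Slices never cross an SP boundary: the last slice of each FR SP is closed
    before the next SP starts, and BR SP slices skip over all FR SP MBs.
    """
    slices, claimed = [], set()
    for left, top, width, height in fr_rects:
        sp = [(top + r) * cols + (left + c)
              for r in range(height) for c in range(width)]
        sp = [a for a in sp if a not in claimed]  # clip by importance
        claimed.update(sp)
        slices.extend(chunk(sp, slice_size))
    background = [a for a in range(cols * rows) if a not in claimed]
    slices.extend(chunk(background, slice_size))
    return slices


# Hypothetical geometry: 10x8 MBs, one 3x4-MB FR SP at column 3, row 2.
slices = form_slices(10, 8, [(3, 2, 3, 4)], slice_size=5)
print(slices[0])    # [23, 24, 25, 33, 34]  (3 MBs of one row, 2 of the next)
print(slices[2])    # [54, 55]              (FR SP's last slice is closed early)
print(slices[7])    # [20, 21, 22, 26, 27]  (BR SP slice skips MBs 23-25)
print(len(slices))  # 17
```

Only the final slice of each SP may be shorter than the maximum, which is why the 12 FR SP MBs need three slices and the 68 BR SP MBs need fourteen.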

FIG. 3 shows a flowchart of the video encoding process according to the preferred embodiment of the invention. The flowchart starts from block 310, wherein a picture to be video encoded has been received.

After the start, the process continues to block 320, where it is attempted to find one or more foreground objects 101. Block 330 then checks if any foreground object 101 has been found. If not, block 331 encodes the picture as a single coding object and the process ends. If so, block 340 picks the most important foreground object 101 that has not yet been encoded. Block 350 then determines the smallest possible region of macroblocks (FR SP 102) that covers the picked foreground object 101. Typically, the possible regions are limited to those of a predetermined shape, such as rectangles (including squares), as this shape provides simple video coding and decoding that suits portable devices well. In alternative embodiments of the invention, other predetermined shapes of the possible regions can be used, provided that a mechanism is agreed for the video encoder to inform the decoder of the shape used.
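For the rectangular case, block 350 reduces to rounding a bounding box outwards to macroblock borders. The sketch below assumes, as the text does not say, that the object detection of block 320 delivers an inclusive pixel bounding box; the function name `covering_mb_rect` is hypothetical.

```python
MB_SIZE = 16


def covering_mb_rect(x0, y0, x1, y1):
    """Smallest MB-aligned rectangle covering a pixel bounding box.

    (x0, y0) and (x1, y1) are the inclusive top-left and bottom-right pixel
    coordinates of the detected foreground object. Returns
    (left, top, width, height) in macroblock units.
    """
    left, top = x0 // MB_SIZE, y0 // MB_SIZE
    right, bottom = x1 // MB_SIZE, y1 // MB_SIZE
    return left, top, right - left + 1, bottom - top + 1


print(covering_mb_rect(50, 35, 90, 100))  # (3, 2, 3, 5)
```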

In block 360, FR SP 102 is then video encoded. After that, block 370 checks if there is still a foreground object 101 not yet encoded. If yes, the process returns to block 340, otherwise it proceeds to block 380. In block 380 the BR SP, i.e. the MBs not belonging to any FR SP, is video encoded. The process then ends in block 390.

In an alternative embodiment, block 350 determines the smallest possible region of macroblocks (FR SP 102) that covers the picked foreground object 101 in a series of consecutive pictures. In yet another alternative embodiment, block 350 determines the smallest possible region of macroblocks such that it reserves an amount of room around the picked foreground object. In a further alternative embodiment, the possible region of macroblocks is of a predetermined size and/or shape.
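The first two alternatives can be combined in one sketch: take the union of the object's bounding boxes over a series of pictures, enlarge it by a margin, clamp it to the picture and round it out to macroblock borders. The helper name, the inclusive pixel-box input format and the margin parameter are hypothetical assumptions; a stable region of this kind also suits the independent SP coding mode, where the segmentation stays fixed over a group of pictures.

```python
MB_SIZE = 16


def region_over_time(boxes, margin, width, height):
    """MB-aligned region covering an object across several pictures.

    boxes: inclusive pixel boxes (x0, y0, x1, y1), one per picture.
    margin: extra pixels reserved around the object on every side.
    The enlarged box is clamped to the picture before MB alignment.
    """
    x0 = max(0, min(b[0] for b in boxes) - margin)
    y0 = max(0, min(b[1] for b in boxes) - margin)
    x1 = min(width - 1, max(b[2] for b in boxes) + margin)
    y1 = min(height - 1, max(b[3] for b in boxes) + margin)
    left, top = x0 // MB_SIZE, y0 // MB_SIZE
    return left, top, x1 // MB_SIZE - left + 1, y1 // MB_SIZE - top + 1


# Object drifting right over two QCIF pictures, 8-pixel margin:
print(region_over_time([(50, 35, 90, 100), (60, 40, 110, 105)], 8, 176, 144))
# (2, 1, 6, 7)
```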

FIG. 4 shows a flowchart of the decoding process according to the preferred embodiment of the invention. The process starts from block 410, where encoded video information corresponding to a video encoded picture has been received. In block 420 the decoder then attempts to find any encoded FR SPs. In block 430 it is checked if any FR SPs were found. If not, an attempt is made in block 431 to decode the picture as a single coding object; otherwise the process continues to block 440. Block 440 picks the most important FR SP that has not yet been decoded. In block 450 the picked FR SP is then decoded, and block 460 checks if there is still an FR SP not yet decoded. If yes, the process returns to block 440, otherwise it proceeds to block 470. In block 470, it is checked if all the FR SPs have been correctly decoded. If not, the process continues from block A shown in FIG. 5. If yes, the process proceeds to block 480, wherein the BR SP is decoded. After this, the process ends in block 490.

FIG. 5 shows a flowchart of the decoding of a BR SP according to the preferred embodiment of the invention in the case that not all FR SPs have been decoded correctly. The decoding starts from block 510. In block 520 it is then checked whether the position and size of each FR SP are known. In independent sub-picture decoding mode, the position and size of sub-pictures can be changed only in INTRA pictures (similarly to H.263 Independent Segment Decoding). This fact can be used in practical implementations. Knowing the position and size of each FR SP is important for BR SP decoding, since the BR SP can only be determined if the position and size of every FR SP are known. If they are not known, the decoder cannot decode the BR SP at all and the process ends; otherwise the decoder proceeds to block 530. In block 530 it is checked whether the BR SP has been encoded using any corrupted FR SP. The BR SP may have been encoded without any reference to the MBs of the FR SPs, in which case the answer is necessarily no and the process continues to block 550, where the BR SP is decoded. However, if the BR SP has been coded using any corrupted FR SP, the process continues from block 530 to block 540, error concealment of the BR SP. Basically, when the position and size of each FR SP are known, the BR SP can be estimated using a previous BR SP and/or the present FR SP(s). In the simplest case, the immediately preceding BR SP may be used as such, provided all the FR SPs are the same as in the previous picture with regard to their size and position. Video typically contains much temporal redundancy, which allows this type of error concealment. Furthermore, the error concealment of the BR SP can often utilise the coded representation of the BR SP for recovery.
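The decisions of FIGS. 4 and 5 concerning the BR SP can be condensed into one function. In this hypothetical Python sketch, each FR SP is summarised by three flags (geometry known, decoded correctly, referenced by the BR SP); these names, and the string return values, are assumptions for illustration, and the actual decoding and concealment steps are left out.

```python
from typing import NamedTuple


class FrStatus(NamedTuple):
    geometry_known: bool      # position and size of this FR SP are known
    decoded_ok: bool          # FR SP was decoded without errors
    used_by_background: bool  # BR SP was coded with reference to this FR SP


def background_action(fr_sps):
    """Decide how to handle the BR SP, following FIGS. 4 and 5.

    Returns "decode", "conceal" or "skip".
    """
    if all(sp.decoded_ok for sp in fr_sps):
        return "decode"   # block 470 yes -> block 480
    if not all(sp.geometry_known for sp in fr_sps):
        return "skip"     # block 520 no: the BR SP shape is unknown
    if any(not sp.decoded_ok and sp.used_by_background for sp in fr_sps):
        return "conceal"  # block 530 yes -> block 540
    return "decode"       # block 530 no -> block 550


print(background_action([FrStatus(True, False, True)]))  # conceal
```

Checking the geometry before anything else matches the observation that a decoder can save power by not attempting BR SP decoding when its shape cannot be determined.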

The coding process will next be described in more detail. In the preferred embodiment, two different coding modes can be used for coding the FR SPs: independent SP coding and normal prediction mode. In independent SP coding, the boundaries of FR SPs are treated as picture boundaries, and the SP segmentation is static over a group of pictures (or any similar grouping of pictures). Both temporal and spatial prediction across the SP boundaries are prevented when coding the FR SPs, to restrain error propagation. For example, motion vectors used in motion-compensated encoding of an FR SP may not point outside the FR SP, and neither spatial prediction nor loop filtering is allowed across the FR SP boundaries. The BR SP, however, can be coded allowing temporal and spatial prediction across its boundaries. It is considered to have a lower subjective importance and does not need to be protected against error propagation.
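The motion-vector restriction in independent SP coding amounts to checking that every reference sample read by a prediction block lies inside the FR SP. The sketch below assumes motion vectors in ⅛-pel units (the resolution used in the test conditions later in this document). For fractional positions it assumes a symmetric interpolation margin of 3 samples. That margin is a conservative placeholder, not a value taken from the TML interpolation filter, and the function name is hypothetical.

```python
from typing import Tuple

MB = 16  # macroblock size in luma samples
Rect = Tuple[int, int, int, int]


def mv_allowed(block: Rect, mv: Tuple[int, int], sp_rect_mb: Rect,
               frac_bits: int = 3, interp_margin: int = 3) -> bool:
    """True if the reference area of a prediction block stays inside the FR SP.

    block: (x, y, w, h) in luma samples, absolute picture coordinates.
    mv: motion vector in 1/2**frac_bits pel units (frac_bits=3 gives 1/8 pel).
    sp_rect_mb: FR SP rectangle (left, top, width, height) in macroblocks.
    interp_margin: extra samples assumed to be read on each side for sub-pel
    interpolation (a conservative placeholder value).
    """
    x, y, w, h = block
    left, top, width, height = sp_rect_mb
    scale = 1 << frac_bits
    ix, fx = divmod(mv[0], scale)   # floor-based split, also for negative vectors
    iy, fy = divmod(mv[1], scale)
    mx = interp_margin if fx else 0
    my = interp_margin if fy else 0
    x0, x1 = x + ix - mx, x + ix + w + mx   # half-open ranges of samples read
    y0, y1 = y + iy - my, y + iy + h + my
    return (left * MB <= x0 and x1 <= (left + width) * MB and
            top * MB <= y0 and y1 <= (top + height) * MB)
```

An encoder in independent SP mode would simply skip motion-search candidates for which this check fails.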

The BR SP can thus be predicted from the FR SPs, and the BR SP cannot be determined at all if any of the data packets characterising the FR SPs has not been received by the decoder. Consequently, decoding of the BR SP need not even be attempted when the size or position of any of the FR SPs cannot be determined, which reduces the power consumption of the decoder. Moreover, since the size and position of the FR SPs are always known before the BR SP is to be decoded, they can well be used as a basis for encoding the BR SP.
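The BR SP is whatever remains after the FR SPs are removed. Its macroblocks, and the scan-order slices that skip over FR SP macroblocks, can therefore be derived only once every FR SP rectangle is known. Below is a minimal sketch. It assumes raster-scan macroblock addresses (row × columns + column) and slices with a fixed number of macroblocks, as in the slice mode of the test conditions. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height) in macroblocks


def background_mbs(cols: int, rows: int,
                   fr_rects: Sequence[Optional[Rect]]) -> Optional[List[int]]:
    """Raster-scan addresses of the BR SP macroblocks, or None if the BR SP
    shape is undefined because some FR SP rectangle is unknown."""
    known = [r for r in fr_rects if r is not None]
    if len(known) != len(fr_rects):
        return None  # decoding of the BR SP is not attempted
    covered = set()
    for left, top, width, height in known:
        for row in range(top, top + height):
            for col in range(left, left + width):
                covered.add(row * cols + col)
    return [addr for addr in range(cols * rows) if addr not in covered]


def background_slices(mbs: List[int], mbs_per_slice: int) -> List[List[int]]:
    """Group consecutive BR SP macroblocks (FR SP macroblocks already skipped)
    into slices of a fixed number of macroblocks."""
    return [mbs[i:i + mbs_per_slice] for i in range(0, len(mbs), mbs_per_slice)]
```

Consider a QCIF picture (11×9 macroblocks) with one 9×7 FR SP at Left = 1, Top = 1. For that layout, `background_mbs(11, 9, [(1, 1, 9, 7)])` returns the 36 border macroblocks.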

The bit-stream syntax according to the preferred embodiment will next be described.

The use of FR SPs is signaled in the parameter list of picture and sequence layer data, for example as suggested in ITU-T VCEG document VCEG-N72R1, 26 Sep. 2001. The sub-picture feature probably falls outside the scope of the baseline profile and belongs only to profiles for error-prone environments.

When sub-pictures are in use, the slice header is as follows:

-   PictureID: As defined in the aforementioned VCEG-N72R1.
-   SliceType: As defined in the aforementioned VCEG-N72R1.
-   FirstMBInSliceX: The horizontal position (column) of the first macroblock in the slice relative to the sub-picture.
-   FirstMBInSliceY: The vertical position (row) of the first macroblock in the slice relative to the sub-picture.
-   InitialQP: As defined in the aforementioned VCEG-N72R1.
-   SubPictureID: Unique identifier of the sub-picture. Each sub-picture is assigned an ID starting from zero and incremented by one in coding order. The count is reset for each picture. If independent sub-picture coding is in use, the sub-picture ID remains the same for spatially matching sub-pictures over a group of pictures.
-   SubPictureInfo:
    -   0: Sub-picture attributes are the same as the attributes of the sub-picture having the same ID in the previous picture. This value is useful especially in the independent sub-picture coding mode.
    -   1: Sub-picture attributes are the same as the attributes of the sub-picture having the same ID in the same picture. This value is used if a sub-picture contains multiple slices.
    -   2: Sub-picture location and size are defined in the following four codewords. If independent sub-picture coding is in use, the following four codewords remain the same within a group of pictures. Repetition of the codewords is allowed for error resiliency purposes.
    -   3: Background sub-picture. If one of the earlier sub-pictures of the same picture is lost and its location and size are not externally signaled (which is typical in normal prediction mode), the decoder does not decode the background sub-picture, as its shape is unknown.
-   Left: The coordinate of the left-most macroblock in the sub-picture (in macroblocks). The left-most macroblock column of the picture is assigned value zero.
-   Top: The coordinate of the top-most macroblock in the sub-picture (in macroblocks). The top-most macroblock row of the picture is assigned value zero.
-   Width: Width of the sub-picture (in macroblocks). The codewords are assigned as shown below. Here Guess = (RightMost − Left)/2 + 1, RightMost is the column address of the right-most macroblock of the picture, and / stands for division by truncation. For example, for a QCIF picture (RightMost = 10) and Left equal to 3, Guess = (10 − 3)/2 + 1 = 4.

    | Symbol no | UVLC code | Width     |
    |-----------|-----------|-----------|
    | 0         | 1         | Guess     |
    | 1         | 001       | Guess + 1 |
    | 2         | 011       | Guess − 1 |
    | 3         | 00001     | Guess + 2 |
    | 4         | 00011     | Guess − 2 |
    | ...       | ...       | ...       |

-   Height: Height of the sub-picture. The codewords are assigned similarly to Width.
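The Width codeword can be decoded in two steps: parse the UVLC symbol, then map it to an offset from Guess. The sketch assumes the interleaved UVLC layout visible in the table: a 0 marker bit followed by an information bit, repeated until a terminating 1. It also assumes the ±1, ±2, ... pattern continues beyond symbol 4. Bits are passed as a string of characters for readability, and the function names are hypothetical.

```python
from typing import Tuple


def decode_uvlc(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one UVLC codeword from a '0'/'1' string starting at pos.

    Assumed layout: each '0' marker bit is followed by one information bit;
    a '1' terminates the codeword. Returns (symbol, position after codeword).
    """
    info, n = 0, 0
    while bits[pos] == "0":
        info = (info << 1) | int(bits[pos + 1])
        n += 1
        pos += 2
    return (1 << n) - 1 + info, pos + 1


def width_from_symbol(symbol: int, left: int, rightmost: int) -> int:
    """Map a Width symbol to a width in macroblocks relative to Guess."""
    guess = (rightmost - left) // 2 + 1   # truncating division; operands >= 0
    if symbol == 0:
        offset = 0
    elif symbol % 2 == 1:                 # symbols 1, 3, 5, ... -> +1, +2, +3
        offset = (symbol + 1) // 2
    else:                                 # symbols 2, 4, 6, ... -> -1, -2, -3
        offset = -(symbol // 2)
    return guess + offset
```

For QCIF (RightMost = 10) and Left = 3, symbols 0 to 4 give widths of 4, 5, 3, 6 and 2 macroblocks.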

Applications for the Source Coding Method

One of the major applications of the video encoding and decoding method of the preferred embodiment is transport prioritization of subjectively important sub-pictures.

The proposed method may improve compression efficiency compared to coding with frequent, i.e. fixedly assigned, slices. Rectangular sub-pictures often have smooth motion fields or consistent texture, and therefore motion vector coding and INTRA coding operate better when applied to a relatively homogeneous sub-picture.

Independent sub-pictures can also be used for picture resolution scalability. Assume that the same QCIF bit-stream, e.g. a multimedia message, is transferred to two handheld devices with different screen sizes. One supports sizes up to QCIF (176×144) and the other up to QQVGA (160×120). There are two conventional ways to fit a QCIF picture onto a QQVGA display rectangle. First, the picture can be downscaled, but this may be computationally costly. Second, the picture can be cropped (8 pixels from the left and right and 12 pixels from the top and bottom), but the cropped pixels must be decoded anyhow. Independent sub-pictures provide yet another solution: the bit-stream could be coded so that there is a 144×112-sized sub-picture centered in the QCIF picture. The bit-stream can then be decoded for a QQVGA display rectangle so that only the sub-picture is decoded. The 9×7-macroblock sub-picture contains 63 of the 99 macroblocks of a QCIF picture, so 36 macroblocks per picture do not have to be decoded.
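The 144×112 size follows from the macroblock grid. A QCIF picture is 11×9 macroblocks, and a sub-picture centred on that grid must leave the same number of macroblock columns (rows) on each side. Its width (height) in macroblocks must therefore have the same parity as 11 (9). The largest such size that fits into 160×120 is 9×7 macroblocks, because 160 pixels would be 10 macroblocks, an even count. The hypothetical helper below reproduces this arithmetic.

```python
from typing import Tuple

MB = 16  # macroblock size in pixels


def centered_subpicture(pic_w: int, pic_h: int,
                        disp_w: int, disp_h: int) -> Tuple[int, int, int, int]:
    """Largest macroblock-aligned rectangle centred in the picture that fits the
    display. Returns (left, top, width, height) in macroblocks."""
    def fit(total_mbs: int, disp: int) -> Tuple[int, int]:
        n = min(total_mbs, disp // MB)
        # equal margins on both sides need total_mbs - n to be even
        if n > 0 and (total_mbs - n) % 2:
            n -= 1
        return (total_mbs - n) // 2, n

    left, width = fit(pic_w // MB, disp_w)
    top, height = fit(pic_h // MB, disp_h)
    return left, top, width, height


left, top, width, height = centered_subpicture(176, 144, 160, 120)
skipped = (176 // MB) * (144 // MB) - width * height
```

Here the result is `(1, 1, 9, 7)`, i.e. 144×112 pixels, and `skipped` is 36.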

An example of Internet streaming using the preferred embodiment is described next.

Multicast Internet streaming was selected as the target application. The basis for the selection was that the common conditions for low-delay Internet applications (VCEG-N79R1) could easily be applied to multicast streaming as well.

As interactive error concealment cannot be used on a large scale with IP multicast, forward error control methods were used. The methods can be applied at the transport coding level (FEC packets, packet duplication) or at the source coding level (INTRA macroblock updating). Three cases were considered:

1.  Relatively long (1 second, or 10 frames at a frame rate of 10 frames/second) initial buffering before starting playback in clients. Reed-Solomon forward error coding used.
2.  Moderate amount (2 frames) of initial buffering before starting playback in clients. Parity forward error coding according to RFC 2733 used.
3.  Moderate amount (2 frames) of initial buffering before starting playback in clients. No transport-level forward error coding.

While the best results can be achieved with case 1, clients may lack the required buffering capabilities. Furthermore, Reed-Solomon FEC packets have not been standardized (as far as we know). Thus, results were also provided for a simple parity-FEC-based scheme (case 2), which should be easy enough to implement in most practical systems. However, some systems, such as the 3GPP packet-switched streaming service (Release 4), do not include support for parity FEC, and therefore case 3 was added to the test set too.

Test Conditions

The Codecs:

The coding method of the preferred embodiment of the invention was implemented based on TML-8.6, a temporary version consisting of TML-8.5 plus the error concealment implementation (VCEG-N62). It was called the rectangular sub-picture (RSP) codec. The performance of the RSP codec was compared to that of conventional codec 1 (TML-8.6 plus region-of-interest quantization) and conventional codec 2 (TML-8.6 without region-of-interest quantization).

Codec Parameters:

-   Motion vector resolution: ⅛ pel
-   Hadamard transform: used
-   Max search range: 16
-   Number of previous frames used for inter motion search: 5
-   All the block types enabled
-   Slice mode: Fixed number of MB per slice
-   B-frames and SP-frames: not used
-   Symbol mode: CABAC
-   Data partition: 1 partition per slice
-   Sequence header: no sequence header
-   Search range restrictions: no
-   Rate-distortion optimized mode decision: on
-   Constrained intra prediction: not used
-   Change QP: not used
-   Additional reference frame: not used

Other Conditions:

-   Instead of encoding 4000 frames as specified in VCEG-N79R1, the PSNR of the decoded video is calculated for each of 10 runs, and the average PSNR plus the best and worst cases of the 10 runs are shown, as proposed in VCEG-M77. This method is used to show how the PSNR varies with the position in the loss pattern files. In the simulation, the loss position at the beginning of run n+1 directly follows the loss position at the end of run n.
-   A constant packetization overhead (40 bytes/packet) is assumed, as in VCEG-N79R1. The packetization overheads of all packets, including the FEC packets, are subtracted from the available total bitrate to calculate the available video bitrate.
-   Since no rate control strategy is implemented in the current TML software, we obtain the desired bit rates with the bit allocation method described below under Bit Allocation.
-   As specified in VCEG-N79R1, PSNR is calculated between every frame of the source sequence (at full frame rate) and the corresponding reconstructed frame.
-   INTRA GOB updates were used instead of a macroblock mode selection mechanism.
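The full-frame-rate PSNR measurement can be sketched as follows. Frames are flat lists of 8-bit luma samples, and each skipped source frame is compared with the most recently decoded frame (frame repetition). That repetition rule and the function names are assumptions, not details taken from VCEG-N79R1. `step=3` matches coding every third source frame, as in the test sequence described below.

```python
import math
from typing import List, Sequence


def psnr(ref: Sequence[int], rec: Sequence[int], peak: int = 255) -> float:
    """PSNR in dB between two equally sized frames of 8-bit samples."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)


def full_rate_psnr(source: List[Sequence[int]], decoded: List[Sequence[int]],
                   step: int = 3) -> List[float]:
    """PSNR of every source frame against the latest decoded frame, assuming
    decoded[k] reconstructs source[k * step] and is repeated until the next one."""
    return [psnr(frame, decoded[i // step]) for i, frame in enumerate(source)]
```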

Test Sequence and Segmentation

The experiments were done with the QCIF version of the Carphone sequence. It was coded at a frame rate of 10 fps with a target total bit-rate of 64 kbps. The number of encoded frames was 101: every third frame of the first 303 frames of the Carphone sequence (2 frames skipped after each coded frame).

The foreground sub-picture was selected manually and covered the head of the person appearing in the sequence in all the pictures of the video clip. In conventional codec 1, the area of the foreground sub-picture was selected as the region of interest, which was quantized more finely than the rest of the picture. A constant 64×64 foreground sub-picture was used throughout the sequence. The independent sub-picture coding mode was in use.

Packetization and Forward Error Correction

In all cases, the sizes of the RS FEC packets are assumed to be equal to the largest size of the packets they protect. If m FEC packets are coded for each block of n video packets, the coding scheme is denoted RS(n,m). The FEC scheme can correct the loss of up to m packets (any combination of video packets and FEC packets) per block.
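The RS(n,m) correction rule and the FEC packet sizing assumption can be written down directly. The block-failure probability additionally assumes independent packet losses at a fixed rate. This is a simplification, since the simulations use loss-pattern files instead. The 40-byte per-packet overhead is the one assumed in the test conditions, and all function names are hypothetical.

```python
from math import comb
from typing import Sequence


def block_recoverable(lost_video: int, lost_fec: int, m: int) -> bool:
    """RS(n, m) corrects any combination of up to m lost packets per block."""
    return lost_video + lost_fec <= m


def block_failure_probability(n: int, m: int, plr: float) -> float:
    """Probability that more than m of the n + m packets of a block are lost,
    assuming independent losses at rate plr."""
    total = n + m
    return sum(comb(total, k) * plr ** k * (1 - plr) ** (total - k)
               for k in range(m + 1, total + 1))


def fec_bytes(video_packet_sizes: Sequence[int], m: int, overhead: int = 40) -> int:
    """Bytes sent for the m FEC packets of one block: each FEC packet is as large
    as the largest protected packet, plus the per-packet overhead."""
    return m * (max(video_packet_sizes) + overhead)
```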

Case 1: 10 frames buffered with RS FEC

The RSP codec (only one foreground sub-picture):

-   For the intra picture, there are 4 packets: 2 packets for the foreground sub-picture (GOB interleaving applied), 1 RS(2,1) packet for the 2 foreground packets, and 1 packet for the background sub-picture.
-   For inter pictures, 10 frames form a group. For each group, there are 10 foreground sub-picture packets, m (m is variable) RS(10,m) packets for the foreground packets, and 10 background sub-picture packets. Note that the packetization method for both the foreground and the background is an interleaving method: the even-numbered GOBs of frame n and the odd-numbered GOBs of frame n+1 are in one packet, and vice versa.
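The interleaving rule can be sketched as below. The sketch assumes GOBs are numbered from 0 within the sub-picture (or picture) and that frames are paired as (0, 1), (2, 3), and so on. With 10 frames per group this yields the 10 packets listed above. Losing any single packet still leaves half of the GOBs of each affected frame. The function name is hypothetical.

```python
from typing import List, Tuple

Unit = Tuple[int, int]  # (frame index, GOB index)


def interleave_pairs(num_frames: int, num_gobs: int) -> List[List[Unit]]:
    """Two packets per frame pair (n, n+1):
    packet A = even GOBs of frame n + odd GOBs of frame n+1,
    packet B = odd GOBs of frame n + even GOBs of frame n+1."""
    if num_frames % 2:
        raise ValueError("frames are packetized in pairs")
    even, odd = range(0, num_gobs, 2), range(1, num_gobs, 2)
    packets: List[List[Unit]] = []
    for n in range(0, num_frames, 2):
        packets.append([(n, g) for g in even] + [(n + 1, g) for g in odd])
        packets.append([(n, g) for g in odd] + [(n + 1, g) for g in even])
    return packets
```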

The Conventional Codecs:

-   For the intra picture, there are 3 packets: 2 GOB-interleaving packets for the entire picture, and 1 RS(2,1) packet.
-   For inter pictures, 10 frames form a group. For each group, there are 20 packets, each of which contains every other GOB of a particular frame, and m (m is variable) RS(20,m) packets.

Case 2: 2 frames buffered with parity FEC

Note that the result of parity FEC for 2 packets is the same as the result of RS(2,1). Therefore, to simplify documentation, the parity FEC is treated as RS(2,1).
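The equivalence with RS(2,1) follows from how a single XOR parity packet works. If either of the two video packets is lost, XORing the surviving packet with the parity packet rebuilds it. If the parity packet is lost, nothing needs rebuilding. The sketch zero-pads payloads to the longest one. RFC 2733 additionally carries a length-recovery field so the receiver can strip that padding, which is omitted here. The function names are hypothetical.

```python
from typing import List


def xor_parity(payloads: List[bytes]) -> bytes:
    """Bytewise XOR of the payloads, zero-padding shorter ones to the longest."""
    parity = bytearray(max(len(p) for p in payloads))
    for p in payloads:
        for i, byte in enumerate(p):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing payload (still zero-padded) from the
    surviving payloads of the block and their parity packet."""
    return xor_parity(received + [parity])
```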

The RSP codec (only one foreground sub-picture):

-   For the intra picture, the same as in case 1.
-   For inter pictures, 2 frames form a group. For each group, there are 2 foreground sub-picture packets, 1 RS(2,1) packet for the foreground packets, and 1 background sub-picture packets. The packetization method for the foreground and the background is the same as in case 1.

The Conventional Codecs:

-   For the intra picture, the same as in case 1.
-   For inter pictures, each frame has 3 packets: 2 GOB-interleaving packets and 1 RS(2,1) packet.

Case 3: 2 frames buffered without FEC

All the codecs use the same packetization method as in case 2. The only difference is that there are no FEC packets.

Bit Allocation

The bit rate is determined by several factors: the intra GOB update (IGU) rate, the FEC rate, the slicing method, and QP. (Note that prediction from outside the intra-updated GOB should be prevented when non-GOB-shaped slices are used. For GOB-shaped slices, the prediction is prevented by the slice prediction limitation.) In the simulations, the first three factors, if variable, are optimized by trial and error. QP is adjusted when the other factors are fixed, as follows:

The QP or QP pair for region of interest (ROI) encoding is fixed for the whole sequence.

For the conventional codec without ROI encoding, QP is adjusted directly to meet the available video bit rate as closely as possible.

For the RSP codec or the conventional codec with ROI encoding, the QP pair is adjusted as follows (QPf is for the foreground, and QPb is for the background):

First decide QPf: set QPb to the maximum (31) and adjust QPf to meet the available video bit rate as closely as possible.

Then refine QPb: keeping QPf fixed as decided above, adjust QPb to meet the available video bit rate as closely as possible.
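The two-stage QP search can be sketched as a pair of one-dimensional searches. The `rate` callback is hypothetical; in the simulations the bit rate comes from actually encoding the sequence. The search range 1 to 31 is an assumption apart from the maximum of 31 stated above. "As closely as possible" is interpreted here as the smallest absolute difference from the target.

```python
from typing import Callable, Tuple

QP_MIN, QP_MAX = 1, 31  # QP_MIN is an assumption; 31 is the maximum named above


def choose_qp_pair(rate: Callable[[int, int], float],
                   target: float) -> Tuple[int, int]:
    """Two-stage search for (QPf, QPb); rate(qpf, qpb) returns the video bit rate."""
    qps = range(QP_MIN, QP_MAX + 1)
    # Step 1: QPb at its maximum, choose QPf closest to the target bit rate
    qpf = min(qps, key=lambda q: abs(rate(q, QP_MAX) - target))
    # Step 2: keep QPf, refine QPb towards the target bit rate
    qpb = min(qps, key=lambda q: abs(rate(qpf, q) - target))
    return qpf, qpb


def toy_rate(qpf: int, qpb: int) -> float:
    """Hypothetical rate model used only to exercise the search."""
    return 2000 / qpf + 3000 / qpb


print(choose_qp_pair(toy_rate, 400))
```

With this toy model and a target of 400, the first step settles on QPf = 7 and the refinement then lowers QPb from 31 to 26, so the script prints `(7, 26)`.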

Results

Objective Results

Only the optimized results are presented here. When optimizing the IGU rate, the FEC rate and the slicing method, no range limitations are put on them; the largest ranges allowed by the codecs are used. That is, the IGU rate ranges from 0 to 1 GOB/frame, the FEC rate (m in RS(n,m)) from 0 upwards with no upper limit, and the slicing from 1 to 9 GOBs/slice (the maximum for a QCIF sequence).

Results of the three cases were obtained. In the following discussion, PLR denotes the packet loss rate, and IGUf and IGUb are the IGU rates of the foreground and background sub-pictures, respectively. The unit for IGU rate is GOB/frame.

The results showed that:

-   In each case, the proposed RSP codec has the best PSNRs for the foreground region, and the conventional codec with ROI coding is better than the conventional codec without ROI coding. In case 1, the average PSNR of the RSP codec is 0.78 dB to 0.96 dB higher than that of the conventional codec with ROI coding, and 1.94 dB to 2.40 dB higher than that of the conventional codec without ROI coding. In case 2, the improvements are 1.09 dB to 1.59 dB and 2.04 dB to 2.38 dB, respectively. In case 3, the improvements are 0.28 dB to 1.52 dB and 1.28 dB to 1.86 dB, respectively, when PLR is non-zero.
-   If relatively long sequences are buffered (case 1), proper RS FEC can recover all packet losses at PLRs of 3%, 5% and 10%, and most packet losses at a PLR of 20%.
-   In case 3 the RSP codec is better than the conventional codecs. One reason is that the foreground region has a larger effective IGU rate with the RSP codec. A QCIF sequence has 9 GOBs per frame, so IGU=1 means that the real IGU rate is 1/9. For the foreground sub-picture (assuming it has 6 rows of MBs), IGU=1 means that the real IGU rate is ⅙. In the current TML software, the maximum IGU is 1. However, the presented results suggest that a larger IGU rate or another intra update method should be developed.
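The IGU argument is simple arithmetic: with one INTRA-updated GOB per frame, a region of R GOB rows is fully refreshed every R frames. The sketch below uses a hypothetical helper name. It compares the whole QCIF picture (9 GOBs) with a foreground sub-picture assumed, as above, to have 6 macroblock rows, both at 10 fps.

```python
from fractions import Fraction


def intra_refresh(igu_gobs_per_frame: int, gob_rows: int, fps: float):
    """Return (fraction of rows refreshed per frame, frames per full refresh,
    seconds per full refresh) for cyclic INTRA GOB updating."""
    fraction = Fraction(igu_gobs_per_frame, gob_rows)
    frames = gob_rows / igu_gobs_per_frame
    return fraction, frames, frames / fps


whole = intra_refresh(1, 9, 10)       # whole QCIF picture
foreground = intra_refresh(1, 6, 10)  # foreground sub-picture (6 MB rows assumed)
```

This gives a full refresh every 0.9 s for the whole picture versus every 0.6 s for the foreground sub-picture.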

Subjective Results

For each case, the run whose PSNR values were closest to the average was chosen as the representative sequence for subjective evaluation.

Even though snapshots do not give a complete idea of the behavior of the different schemes, they capture some aspects. Snapshots of the last decoded picture at 0%, 5% and 20% packet loss rates and for the proposed coding scheme were examined.

The snapshots showed that the conventional coding scheme with a constant QP clearly looks the worst in all cases. They also showed that the area of interest in the proposed coding scheme is subjectively better than in the other schemes. This can be seen most clearly in the 20% loss rate case when no FEC packets are used.

In general, the presented snapshots are in line with the fact that losses are recovered by FEC packets and INTRA GOB updates relatively soon (as there are hardly any visible errors in the snapshots).

FIG. 6 shows a block diagram of a mobile communication device MS according to the preferred embodiment of the invention. In the mobile communication device, a Master Control Unit MCU controls blocks responsible for the device's various functions: a Random Access Memory RAM, a Radio Frequency part RF, a Read Only Memory ROM, a video codec CODEC and a User Interface UI. The user interface comprises a keyboard KB, a display DP, a speaker SP and a microphone MF. The MCU is a microprocessor or, in alternative embodiments, some other kind of processor, for example a Digital Signal Processor. Advantageously, the operating instructions of the MCU have been stored previously in the ROM memory. In accordance with its instructions (i.e. a computer program), the MCU uses the RF block for transmitting and receiving data over a radio path. The video codec may be either hardware based or fully or partly software based. In the latter case, the CODEC comprises computer programs for controlling the MCU to perform video encoding and decoding functions as required. The MCU uses the RAM as its working memory. The mobile communication device can capture motion video with a video camera, and encode and packetize the motion video using the MCU, the RAM and CODEC-based software. The RF block is then used to exchange encoded video with other parties.

FIG. 7 shows a video communication system 70 comprising a plurality of mobile communication devices MS, a mobile telecommunications network 71, the Internet 72, a video server 73 and a fixed PC connected to the Internet. The video server has a video encoder and can provide on-demand video streams such as weather forecasts or news.

The preferred embodiment of the invention is based on a region-based coding scheme. Unlike MPEG-4 video, it does not require any complicated processing of arbitrarily shaped regions in video encoding and decoding, and it is therefore well suited to handheld devices. The preferred embodiment of the invention provides a robust video coding and decoding tool that enables transport prioritization and achieves subjectively better picture quality in error-prone video communication systems.

The preferred embodiment may be applied in various contexts, for example in the context of the ITU-T H.26L video coding standard. Particular implementations and embodiments of the invention have been described. It is clear to a person skilled in the art that the invention is not restricted to details of the embodiments presented above, but that it can be implemented in other embodiments using equivalent means without deviating from the characteristics of the invention. The scope of the invention is only restricted by the attached patent claims.

Abbreviations:

AVO Audio-Visual Object

BR Background Region

CABAC Context-based Adaptive Binary Arithmetic Coding

DCT Discrete Cosine Transform

DPL Data Partitioning Layer

FEC Forward Error Correction

FR Foreground Region

GOB Group Of Blocks

IGU intra GOB update

ITU International Telecommunication Union

MB Macroblock

MPEG Moving Picture Experts Group

NAL Network Adaptation Layer

QCIF Quarter Common Intermediate Format

QP Quantisation Parameter

QPb Quantisation Parameter for background

QPf Quantisation Parameter for foreground

QQVGA video format with 160×120 pixels

ROI Region Of Interest

RS Reed-Solomon

RSP Rectangular Sub-Picture

RTP Real-time Transport Protocol

SEI Supplemental Enhancement Information

SP Sub-Picture

TML Test Model Long-term

UVLC Universal Variable Length Code

VCL Video Coding Layer

VO Video Object

VOL Video Object Layer

VOP Video Object Plane

YUV three colour components

1. A method comprising: encoding a first picture, dividing a second picture into a set of regular shaped coding blocks having a predetermined alignment in relation to the area of the picture, each coding block corresponding to at least one group of elementary coding elements; determining at least one shape within the second picture; selecting at least one subset of the coding blocks defining at least one area covering the at least one determined shape; determining as at least one separate coding object the selected at least one subset of the coding blocks; determining as a background object the subset of the coding blocks that corresponds to the part of the second picture that excludes the at least one separate coding object; encoding the at least one separate coding object; encoding as one coding object the background object; and predicting at least one coding block of the background object from the first picture.
2. A method according to claim 1, wherein the background object is coded using the at least one separate coding object.
3. A method according to claim 1, wherein encoding as one coding object the background coding object further comprises defining coding slices in a scan-order so that the slices are composed by consecutive coding blocks skipping those basic coding objects which are included in the at least one separate coding object.
4. A method according to claim 1, wherein encoding the at least one separate coding object further comprises defining within each separate coding object coding slices in a scan-order so that the slices are composed in the scan-order of coding blocks included in the at least one separate coding object.
5. A method according to claim 1, wherein the area covering the at least one determined shape is a rectangular area, whereby square is one subset of rectangles.
6. A method according to claim 1, wherein the separate coding objects are defined in a descending order of subjective importance.
7. A method according to claim 1, wherein encoding the at least one separate coding object is independent of encoding as one coding object the background object so as to inhibit error propagation into the at least one separate coding object.
8. A method according to claim 1, wherein the method further comprises assigning a different identifier to the at least one separate coding object for correlating each of the at least one separate coding object and corresponding characteristics.
9. A method comprising: storing a first picture and subsequent second picture both coded by a set of coding blocks, each coding block corresponding to at least one group of the elementary coding elements and the coding blocks having a predetermined alignment in relation to the area of the picture in question; decoding the second picture in a process comprising: determining at least one separate coding object of the second picture corresponding to at least one subset of the coding blocks defining at least one part of the second picture; determining as a background object the subset of the coding blocks that corresponds to the part of the second picture that excludes the at least one separate coding object of the second picture; decoding the at least one separate coding object; and decoding the background object by predicting at least one coding block of the background object from the first picture.
10. A method according to claim 9, further comprising determining video decoding slices for the background object by forming a decoding slice of consecutive coding blocks and skipping the coding blocks which belong to any of the separate coding object.
11. A method according to claim 9, wherein the video decoding of the at least one separate coding object is independent of the video decoding of the background object.
12. A method according to claim 9, wherein the at least one separate object corresponds to at least one foreground region.
13. A video decoder comprising: a memory configured to store a first picture and a subsequent second picture both coded by a set of coding blocks, each coding block corresponding to at least one group of the elementary coding elements and the coding blocks having a predetermined alignment in relation to the area of the picture in question; and a processor configured to decode the second picture by: determining at least one separate coding object of the second picture corresponding to at least one subset of the coding blocks defining at least one part of the second picture; determining as a background object the subset of the coding blocks that corresponds to the part of the second picture that excludes the at least one separate coding object of the second picture; decoding the at least one separate coding object; and decoding the background object by predicting at least one coding block of the background object from the first picture.
14. A video decoder according to claim 13, wherein the processor is further configured to determine video decoding slices for the background object and form a decoding slice of consecutive coding blocks and skipping the coding blocks which belong to any of the separate coding object of the second picture.
15. A video decoder according to claim 13, wherein the video decoding of the at least one separate coding object is independent of the video decoding of the background object.
16. A video decoder according to claim 13, wherein the at least one separate object corresponds to at least one foreground region.
17. A video encoder comprising a processor configured to: encode a first picture, divide a second picture into a set of regular shaped coding blocks having a predetermined alignment in relation to the area of the picture, each coding block corresponding to at least one group of elementary coding elements; determine at least one shape within the second picture; select at least one subset of the coding blocks defining at least one area covering the at least one determined shape; determine as at least one separate coding object the selected at least one subset of the coding blocks; determine as a background object the subset of the coding blocks that corresponds to the part of the second picture that excludes the at least one separate coding object; encode the at least one separate coding object; encode as one coding object the background object; and predict at least one coding block of the background object from the first picture.
18. A video encoder according to claim 17, wherein the background coding object is coded using the at least one separate coding object.
19. A video encoder according to claim 17, wherein encoding of the background coding object further comprises defining coding slices in a scan-order so that the slices are composed by consecutive coding blocks skipping those basic coding objects which are included in the at least one separate coding object.
20. A video encoder according to claim 19, further configured to encode the background coding object as one unitary coding object.
21. A video encoder according to claim 19, wherein encoding of the at least one separate coding object further comprises defining within each separate coding object coding slices in a scan-order so that the slices are composed in the scan-order of coding blocks included in the at least one separate coding object.
22. A video encoder according to claim 19, wherein the area covering the at least one determined shape is a rectangular area, whereby square is one subset of rectangles.
23. A video encoder according to claim 19, wherein the separate coding objects are defined in a descending order of subjective importance.
24. A video encoder according to claim 19, wherein encoding of the at least one separate coding object is independent of encoding as one coding object the background object so as to inhibit error propagation into the at least one separate coding object.
25. A video encoder according to claim 19, further configured to assign a different identifier to the at least one separate coding object for correlating each of the at least one separate coding object and corresponding characteristics.
26. A computer program product comprising computer executable program means embodied on a computer readable medium, the program product comprising: computer executable program code for encoding a first picture, computer executable program code for dividing a second picture into a set of regular shaped coding blocks having a predetermined alignment in relation to the area of the picture, each coding block corresponding to at least one group of elementary coding elements; computer executable program code for determining at least one shape within the second picture; computer executable program code for selecting at least one subset of the coding blocks defining at least one area covering the at least one determined shape; computer executable program code for determining as at least one separate coding object the selected at least one subset of the coding blocks; computer executable program code for determining as a background object the subset of the coding blocks that corresponds to the part of the second picture that excludes the at least one separate coding object; computer executable program code for encoding the at least one separate coding object; computer executable program code for encoding as one coding object the background object; and computer executable program code for predicting at least one coding block of the background object from the first picture.
27. A computer program product comprising computer executable program means embodied on a computer readable medium, the program product comprising: computer executable program code for storing a first picture and subsequent second picture both coded by a set of coding blocks, each coding block corresponding to at least one group of the elementary coding elements and the coding blocks having a predetermined alignment in relation to the area of the picture in question; computer executable program code for decoding the second picture in a process comprising: computer executable program code for determining at least one separate coding object of the second picture corresponding to at least one subset of the coding blocks defining at least one part of the second picture; computer executable program code for determining as a background object the subset of the coding blocks that corresponds to the part of the second picture that excludes the at least one separate coding object of the second picture; computer executable program code for decoding the at least one separate coding object; and computer executable program code for decoding the background object by predicting at least one coding block of the background object from the first picture.